Overview

Dataset statistics

Number of variables: 10
Number of observations: 5695
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 467.2 KiB
Average record size in memory: 84.0 B

Variable types

Numeric: 10

Alerts

gross_revenue is highly correlated with invoiceno and 3 other fields [High correlation]
recencydays is highly correlated with invoiceno [High correlation]
invoiceno is highly correlated with gross_revenue and 4 other fields [High correlation]
quantity is highly correlated with gross_revenue and 3 other fields [High correlation]
frequency is highly correlated with invoiceno [High correlation]
qtd_return is highly correlated with invoiceno [High correlation]
avg_basket_size is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_unique_basket_size is highly correlated with gross_revenue and 2 other fields [High correlation]
gross_revenue is highly correlated with invoiceno and 1 other field [High correlation]
invoiceno is highly correlated with gross_revenue and 1 other field [High correlation]
quantity is highly correlated with gross_revenue and 1 other field [High correlation]
avg_ticket is highly correlated with qtd_return [High correlation]
qtd_return is highly correlated with avg_ticket [High correlation]
avg_basket_size is highly correlated with avg_unique_basket_size [High correlation]
avg_unique_basket_size is highly correlated with avg_basket_size [High correlation]
gross_revenue is highly correlated with invoiceno and 2 other fields [High correlation]
invoiceno is highly correlated with gross_revenue and 1 other field [High correlation]
quantity is highly correlated with gross_revenue and 1 other field [High correlation]
frequency is highly correlated with invoiceno [High correlation]
avg_basket_size is highly correlated with gross_revenue and 1 other field [High correlation]
customerid is highly correlated with recencydays [High correlation]
gross_revenue is highly correlated with invoiceno and 3 other fields [High correlation]
recencydays is highly correlated with customerid [High correlation]
invoiceno is highly correlated with gross_revenue and 1 other field [High correlation]
quantity is highly correlated with gross_revenue and 1 other field [High correlation]
avg_ticket is highly correlated with gross_revenue and 1 other field [High correlation]
qtd_return is highly correlated with gross_revenue and 1 other field [High correlation]
avg_basket_size is highly correlated with avg_unique_basket_size [High correlation]
avg_unique_basket_size is highly correlated with avg_basket_size [High correlation]
gross_revenue is highly skewed (γ1 = 21.33147068) [Skewed]
avg_ticket is highly skewed (γ1 = 53.30430577) [Skewed]
qtd_return is highly skewed (γ1 = 51.46514451) [Skewed]
customerid has unique values [Unique]
qtd_return has 4143 (72.7%) zeros [Zeros]
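
The skewness and zeros alerts can be illustrated with a small sketch. It assumes the plain moment coefficient of skewness (third central moment divided by the 1.5 power of the second); the report's own estimator may apply a bias correction, so its values need not match exactly. The function names `moment_skewness` and `zero_share` are hypothetical helpers, not part of pandas-profiling.

```python
def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# A single large value drags the right tail out and makes skewness positive.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 100]
print(moment_skewness(symmetric), moment_skewness(right_tailed))

# The zeros alert for qtd_return follows from the counts in the report.
zeros_pct = round(100 * 4143 / 5695, 1)
print(zeros_pct)
```

A few extreme observations are enough to produce very large skewness values, which is consistent with the maxima reported for gross_revenue, avg_ticket, and qtd_return further down.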

Reproduction

Analysis started: 2022-09-20 10:01:24.384375
Analysis finished: 2022-09-20 10:03:02.019253
Duration: 1 minute and 37.63 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

customerid
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 5695
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31232.13942
Minimum: 12346
Maximum: 83709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 66.7 KiB

Quantile statistics

Minimum: 12346
5-th percentile: 12699.1
Q1: 14288.5
Median: 16229
Q3: 18210.5
95-th percentile: 82731.1
Maximum: 83709
Range: 71363
Interquartile range (IQR): 3922

Descriptive statistics

Standard deviation: 28408.38395
Coefficient of variation (CV): 0.909588151
Kurtosis: -0.5185359982
Mean: 31232.13942
Median Absolute Deviation (MAD): 1962
Skewness: 1.210180524
Sum: 177867034
Variance: 807036278.5
Monotonicity: Not monotonic
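
Several of the figures in each variable's statistics are derived from the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, and the coefficient of variation is the standard deviation divided by the mean. A minimal sketch that checks this against the customerid values reported above (the helper `derived_stats` is hypothetical and used only for illustration):

```python
def derived_stats(minimum, q1, q3, maximum, mean, std):
    """Recompute range, IQR and coefficient of variation from basic statistics."""
    return {
        "range": maximum - minimum,
        "iqr": q3 - q1,
        "cv": std / mean,
    }


# Values copied from the customerid statistics in this report.
stats = derived_stats(
    minimum=12346,
    q1=14288.5,
    q3=18210.5,
    maximum=83709,
    mean=31232.13942,
    std=28408.38395,
)
print(stats)
```

The same relationships hold for every variable in the report, so they can serve as a quick consistency check when copying numbers out of it.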
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
17850                   1  < 0.1%
16344                   1  < 0.1%
12922                   1  < 0.1%
82097                   1  < 0.1%
16589                   1  < 0.1%
13730                   1  < 0.1%
16866                   1  < 0.1%
82095                   1  < 0.1%
82094                   1  < 0.1%
82093                   1  < 0.1%
Other values (5685)  5685  99.8%

Minimum 10 values

Value               Count  Frequency (%)
12346                   1  < 0.1%
12347                   1  < 0.1%
12348                   1  < 0.1%
12349                   1  < 0.1%
12350                   1  < 0.1%
12352                   1  < 0.1%
12353                   1  < 0.1%
12354                   1  < 0.1%
12355                   1  < 0.1%
12356                   1  < 0.1%

Maximum 10 values

Value               Count  Frequency (%)
83709                   1  < 0.1%
83708                   1  < 0.1%
83707                   1  < 0.1%
83706                   1  < 0.1%
83705                   1  < 0.1%
83704                   1  < 0.1%
83700                   1  < 0.1%
83699                   1  < 0.1%
83696                   1  < 0.1%
83695                   1  < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 5461
Distinct (%): 95.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1862.766266
Minimum: 0.42
Maximum: 280206.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 13.635
Q1: 244.605
Median: 639.89
Q3: 1653.865
95-th percentile: 5522.757
Maximum: 280206.02
Range: 280205.6
Interquartile range (IQR): 1409.26

Descriptive statistics

Standard deviation: 7963.897229
Coefficient of variation (CV): 4.275306771
Kurtosis: 594.1773042
Mean: 1862.766266
Median Absolute Deviation (MAD): 501.75
Skewness: 21.33147068
Sum: 10608453.88
Variance: 63423659.07
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
7.95                    9  0.2%
1.25                    8  0.1%
4.95                    8  0.1%
2.95                    8  0.1%
12.75                   7  0.1%
3.75                    7  0.1%
1.65                    7  0.1%
7.5                     6  0.1%
5.95                    6  0.1%
4.25                    6  0.1%
Other values (5451)  5623  98.7%

Minimum 10 values

Value               Count  Frequency (%)
0.42                    1  < 0.1%
0.65                    1  < 0.1%
0.79                    1  < 0.1%
0.84                    3  0.1%
0.85                    3  0.1%
1.07                    1  < 0.1%
1.25                    8  0.1%
1.44                    1  < 0.1%
1.65                    7  0.1%
1.69                    1  < 0.1%

Maximum 10 values

Value               Count  Frequency (%)
280206.02               1  < 0.1%
259657.3                1  < 0.1%
194550.79               1  < 0.1%
168472.5                1  < 0.1%
143825.06               1  < 0.1%
124914.53               1  < 0.1%
117379.63               1  < 0.1%
91062.38                1  < 0.1%
81024.84                1  < 0.1%
77183.6                 1  < 0.1%

recencydays
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 304
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.7311677
Minimum: 0
Maximum: 373
Zeros: 38
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 22.5
Median: 71
Q3: 199
95-th percentile: 337.3
Maximum: 373
Range: 373
Interquartile range (IQR): 176.5

Descriptive statistics

Standard deviation: 111.5236412
Coefficient of variation (CV): 0.955388723
Kurtosis: -0.6387979375
Mean: 116.7311677
Median Absolute Deviation (MAD): 61
Skewness: 0.8160444818
Sum: 664784
Variance: 12437.52255
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1                     110  1.9%
4                     105  1.8%
3                      98  1.7%
2                      92  1.6%
10                     86  1.5%
8                      82  1.4%
9                      80  1.4%
17                     79  1.4%
7                      78  1.4%
22                     65  1.1%
Other values (294)   4820  84.6%

Minimum 10 values

Value               Count  Frequency (%)
0                      38  0.7%
1                     110  1.9%
2                      92  1.6%
3                      98  1.7%
4                     105  1.8%
5                      52  0.9%
7                      78  1.4%
8                      82  1.4%
9                      80  1.4%
10                     86  1.5%

Maximum 10 values

Value               Count  Frequency (%)
373                    23  0.4%
372                    22  0.4%
371                    17  0.3%
369                     4  0.1%
368                    13  0.2%
367                    16  0.3%
366                    15  0.3%
365                    19  0.3%
364                    11  0.2%
362                     7  0.1%

invoiceno
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 59
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.49165935
Minimum: 1
Maximum: 210
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 11.3
Maximum: 210
Range: 209
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.868852155
Coefficient of variation (CV): 1.96721715
Kurtosis: 308.5962454
Mean: 3.49165935
Median Absolute Deviation (MAD): 0
Skewness: 13.32902721
Sum: 19885
Variance: 47.18112993
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1                    2855  50.1%
2                     831  14.6%
3                     508  8.9%
4                     386  6.8%
5                     243  4.3%
6                     172  3.0%
7                     143  2.5%
8                      98  1.7%
9                      68  1.2%
10                     54  0.9%
Other values (49)     337  5.9%

Minimum 10 values

Value               Count  Frequency (%)
1                    2855  50.1%
2                     831  14.6%
3                     508  8.9%
4                     386  6.8%
5                     243  4.3%
6                     172  3.0%
7                     143  2.5%
8                      98  1.7%
9                      68  1.2%
10                     54  0.9%

Maximum 10 values

Value               Count  Frequency (%)
210                     1  < 0.1%
201                     1  < 0.1%
124                     1  < 0.1%
97                      1  < 0.1%
93                      1  < 0.1%
91                      1  < 0.1%
86                      1  < 0.1%
74                      1  < 0.1%
63                      1  < 0.1%
62                      1  < 0.1%

quantity
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 57
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.479543459
Minimum: 1
Maximum: 102
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
Median: 9
Q3: 13
95-th percentile: 20
Maximum: 102
Range: 101
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.638745978
Coefficient of variation (CV): 0.7003233865
Kurtosis: 18.41590315
Mean: 9.479543459
Median Absolute Deviation (MAD): 4
Skewness: 2.545936511
Sum: 53986
Variance: 44.07294816
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
7                     443  7.8%
9                     424  7.4%
1                     420  7.4%
10                    403  7.1%
6                     391  6.9%
8                     386  6.8%
5                     360  6.3%
11                    344  6.0%
4                     294  5.2%
12                    284  5.0%
Other values (47)    1946  34.2%

Minimum 10 values

Value               Count  Frequency (%)
1                     420  7.4%
2                     250  4.4%
3                     248  4.4%
4                     294  5.2%
5                     360  6.3%
6                     391  6.9%
7                     443  7.8%
8                     386  6.8%
9                     424  7.4%
10                    403  7.1%

Maximum 10 values

Value               Count  Frequency (%)
102                     1  < 0.1%
82                      1  < 0.1%
79                      1  < 0.1%
74                      1  < 0.1%
69                      1  < 0.1%
60                      1  < 0.1%
59                      1  < 0.1%
58                      1  < 0.1%
56                      1  < 0.1%
54                      1  < 0.1%

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED

Distinct: 5507
Distinct (%): 96.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.61760545
Minimum: 0.42
Maximum: 77183.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 3.465
Q1: 8.510502308
Median: 16.12
Q3: 22.57508013
95-th percentile: 76.32
Maximum: 77183.6
Range: 77183.18
Interquartile range (IQR): 14.06457782

Descriptive statistics

Standard deviation: 1281.098642
Coefficient of variation (CV): 23.45578191
Kurtosis: 2955.510013
Mean: 54.61760545
Median Absolute Deviation (MAD): 7.217
Skewness: 53.30430577
Sum: 311047.263
Variance: 1641213.73
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
3.75                   11  0.2%
4.95                   10  0.2%
1.25                    9  0.2%
2.95                    9  0.2%
7.95                    8  0.1%
8.25                    7  0.1%
12.75                   7  0.1%
1.65                    7  0.1%
4.15                    6  0.1%
3.35                    6  0.1%
Other values (5497)  5615  98.6%

Minimum 10 values

Value               Count  Frequency (%)
0.42                    2  < 0.1%
0.535                   1  < 0.1%
0.65                    1  < 0.1%
0.79                    1  < 0.1%
0.8371428571            1  < 0.1%
0.84                    2  < 0.1%
0.85                    3  0.1%
1.002222222             1  < 0.1%
1.02                    1  < 0.1%
1.03875                 1  < 0.1%

Maximum 10 values

Value               Count  Frequency (%)
77183.6                 1  < 0.1%
56157.5                 1  < 0.1%
13305.5                 1  < 0.1%
4453.43                 1  < 0.1%
3861                    1  < 0.1%
3096                    1  < 0.1%
2027.86                 1  < 0.1%
1687.2                  1  < 0.1%
1377.077778             1  < 0.1%
1001.2                  1  < 0.1%

frequency
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 1225
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5475706259
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.01102941176
Q1: 0.02492211838
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.9750778816

Descriptive statistics

Standard deviation: 0.5505967909
Coefficient of variation (CV): 1.005526529
Kurtosis: 138.7856997
Mean: 0.5475706259
Median Absolute Deviation (MAD): 0
Skewness: 4.851371477
Sum: 3118.414715
Variance: 0.3031568261
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1                    2879  50.6%
2                      48  0.8%
0.0625                 18  0.3%
0.02777777778          17  0.3%
0.02380952381          16  0.3%
0.09090909091          15  0.3%
0.08333333333          15  0.3%
0.02941176471          14  0.2%
0.03448275862          14  0.2%
0.07692307692          13  0.2%
Other values (1215)  2646  46.5%

Minimum 10 values

Value               Count  Frequency (%)
0.005449591281          1  < 0.1%
0.005464480874          1  < 0.1%
0.005479452055          1  < 0.1%
0.005494505495          1  < 0.1%
0.005586592179          2  < 0.1%
0.005602240896          1  < 0.1%
0.005617977528          2  < 0.1%
0.00566572238           1  < 0.1%
0.005681818182          2  < 0.1%
0.005698005698          3  0.1%

Maximum 10 values

Value               Count  Frequency (%)
17                      1  < 0.1%
4                       1  < 0.1%
3                       5  0.1%
2                      48  0.8%
1.142857143             1  < 0.1%
1                    2879  50.6%
0.75                    1  < 0.1%
0.6666666667            3  0.1%
0.550802139             1  < 0.1%
0.5335120643            1  < 0.1%

qtd_return
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 219
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.01720808
Minimum: 0
Maximum: 80995
Zeros: 4143
Zeros (%): 72.7%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 40
Maximum: 80995
Range: 80995
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1475.325676
Coefficient of variation (CV): 30.72493664
Kurtosis: 2713.85495
Mean: 48.01720808
Median Absolute Deviation (MAD): 0
Skewness: 51.46514451
Sum: 273458
Variance: 2176585.85
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                    4143  72.7%
1                     190  3.3%
2                     156  2.7%
3                     107  1.9%
4                      90  1.6%
6                      72  1.3%
5                      64  1.1%
12                     49  0.9%
8                      49  0.9%
7                      48  0.8%
Other values (209)    727  12.8%

Minimum 10 values

Value               Count  Frequency (%)
0                    4143  72.7%
1                     190  3.3%
2                     156  2.7%
3                     107  1.9%
4                      90  1.6%
5                      64  1.1%
6                      72  1.3%
7                      48  0.8%
8                      49  0.9%
9                      38  0.7%

Maximum 10 values

Value               Count  Frequency (%)
80995                   1  < 0.1%
74215                   1  < 0.1%
9361                    1  < 0.1%
9014                    1  < 0.1%
8060                    1  < 0.1%
4627                    1  < 0.1%
3768                    1  < 0.1%
3335                    1  < 0.1%
2975                    1  < 0.1%
2160                    1  < 0.1%

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2375
Distinct (%): 41.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04377848893
Minimum: 1.347436502 × 10⁻⁵
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 1.347436502 × 10⁻⁵
5-th percentile: 0.001363210568
Q1: 0.003448275862
Median: 0.006622516556
Q3: 0.0133742249
95-th percentile: 0.25
Maximum: 1
Range: 0.9999865256
Interquartile range (IQR): 0.009925949037

Descriptive statistics

Standard deviation: 0.1516964889
Coefficient of variation (CV): 3.465091934
Kurtosis: 28.77128418
Mean: 0.04377848893
Median Absolute Deviation (MAD): 0.003846281132
Skewness: 5.278378743
Sum: 249.3184945
Variance: 0.02301182474
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1                     110  1.9%
0.5                    71  1.2%
0.3333333333           53  0.9%
0.25                   50  0.9%
0.2                    36  0.6%
0.1666666667           28  0.5%
0.08333333333          26  0.5%
0.01                   21  0.4%
0.01369863014          21  0.4%
0.009433962264         20  0.4%
Other values (2365)  5259  92.3%

Minimum 10 values

Value               Count  Frequency (%)
1.347436502 × 10⁻⁵      1  < 0.1%
2.469227255 × 10⁻⁵      1  < 0.1%
7.067637289 × 10⁻⁵      1  < 0.1%
7.165376899 × 10⁻⁵      1  < 0.1%
0.0001278118609         1  < 0.1%
0.0001664078101         1  < 0.1%
0.0001676727029         1  < 0.1%
0.0001923816853         1  < 0.1%
0.0002325581395         1  < 0.1%
0.0002336448598         1  < 0.1%

Maximum 10 values

Value               Count  Frequency (%)
1                     110  1.9%
0.6666666667            1  < 0.1%
0.5                    71  1.2%
0.3333333333           53  0.9%
0.3                     1  < 0.1%
0.25                   50  0.9%
0.2                    36  0.6%
0.1875                  1  < 0.1%
0.1764705882            1  < 0.1%
0.1666666667           28  0.5%

avg_unique_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1282
Distinct (%): 22.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1326961552
Minimum: 0.0008976660682
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 89.0 KiB

Quantile statistics

Minimum: 0.0008976660682
5-th percentile: 0.005704545455
Q1: 0.02777777778
Median: 0.05555555556
Q3: 0.1111111111
95-th percentile: 0.6383116883
Maximum: 1
Range: 0.9991023339
Interquartile range (IQR): 0.08333333333

Descriptive statistics

Standard deviation: 0.2213233687
Coefficient of variation (CV): 1.667895866
Kurtosis: 8.64729026
Mean: 0.1326961552
Median Absolute Deviation (MAD): 0.03514739229
Skewness: 3.021774538
Sum: 755.7046039
Variance: 0.04898403353
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1                     271  4.8%
0.5                   162  2.8%
0.3333333333          115  2.0%
0.07692307692         103  1.8%
0.1                    97  1.7%
0.1111111111           96  1.7%
0.25                   96  1.7%
0.2                    95  1.7%
0.07142857143          95  1.7%
0.09090909091          94  1.7%
Other values (1272)  4471  78.5%

Minimum 10 values

Value               Count  Frequency (%)
0.0008976660682         1  < 0.1%
0.001335113485          1  < 0.1%
0.001367989056          1  < 0.1%
0.001386962552          1  < 0.1%
0.001418439716          1  < 0.1%
0.001455604076          1  < 0.1%
0.001479289941          1  < 0.1%
0.001481481481          1  < 0.1%
0.001510574018          1  < 0.1%
0.001533742331          1  < 0.1%

Maximum 10 values

Value               Count  Frequency (%)
1                     271  4.8%
0.8333333333            1  < 0.1%
0.8                     1  < 0.1%
0.75                    2  < 0.1%
0.6666666667            9  0.2%
0.6428571429            1  < 0.1%
0.6363636364            1  < 0.1%
0.6                     4  0.1%
0.5454545455            1  < 0.1%
0.5263157895            1  < 0.1%

Interactions

[Figure: pairwise interaction plots for the 10 numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
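
As an illustration of that recipe, the sketch below ranks both variables (ties receive the average of their positions, which is an assumption about tie handling) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names are hypothetical; this is not the code the report itself runs.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rho as Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is 1, while r stays below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```

The cubic example shows why the rank-based coefficient is the more natural choice for heavily skewed variables such as those flagged in the alerts.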

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the slope of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
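
A minimal sketch of this calculation, using population (divide-by-n) moments, which is an assumption that does not affect r because the factor cancels. The function name `pearson_r` and the sample data are hypothetical. The second call illustrates the invariance under shifts and positive rescaling described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately; r should be unchanged.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```

Multiplying one variable by a negative factor would flip the sign of r but leave its magnitude unchanged.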

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
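
The pair-counting definition above corresponds to the variant usually called tau-a. A brute-force sketch follows; the function name `kendall_tau_a` is hypothetical, and statistical libraries commonly report a tie-adjusted variant such as tau-b, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, checking every pair of observations."""
    concordant = discordant = total = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        total += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move the same way
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts only toward the total
    return (concordant - discordant) / total


# One swapped pair out of six: (5 - 1) / 6
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

This version is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this report's size.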

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      customerid  gross_revenue  recencydays  invoiceno  quantity  avg_ticket  frequency  qtd_return  avg_basket_size  avg_unique_basket_size
0          17850        5391.21          372         34         6   18.152222  17.000000        40.0         0.019619                0.114478
1          13047        3237.54           31         10        12   18.822907   0.028302        36.0         0.007189                0.058140
2          12583        7281.38            2         15        25   29.479271   0.040323        51.0         0.002964                0.060729
3          13748         948.25           95          5         8   33.866071   0.017921         0.0         0.011390                0.178571
4          15100         876.00          333          3         2  292.000000   0.073171        22.0         0.037500                1.000000
5          15291        4668.30           25         15        17   45.323301   0.040115        29.0         0.007133                0.145631
6          14688        5630.87            7         21        24   17.219786   0.057221       399.0         0.005800                0.064220
7          17809        5411.91           16         12        23   88.719836   0.033520        42.0         0.005834                0.196721
8          15311       60767.90            0         91        43   25.543464   0.243316       474.0         0.002383                0.038251
9          16098        2005.63           87          7        15   29.934776   0.024390         0.0         0.011419                0.104478

Last rows

      customerid  gross_revenue  recencydays  invoiceno  quantity  avg_ticket  frequency  qtd_return  avg_basket_size  avg_unique_basket_size
5685       83700        4839.42            1          1        25   78.055161        1.0         0.0         0.000931                0.016129
5686       13298         360.00            1          1         2  180.000000        1.0         0.0         0.010417                0.500000
5687       14569         227.39            1          1         5   18.949167        1.0         0.0         0.012658                0.083333
5688       83704          17.90            1          1         1    2.557143        1.0         0.0         0.071429                0.142857
5689       83705           3.35            1          1         1    1.675000        1.0         0.0         0.500000                0.500000
5690       83706        6637.59            1          1        23   10.452898        1.0         0.0         0.000572                0.001575
5691       83707        7689.23            0          1        22   10.518782        1.0         0.0         0.000497                0.001368
5692       83708        3217.20            0          1        20   54.528814        1.0         0.0         0.001529                0.016949
5693       83709        5664.89            0          1        16   25.985734        1.0         0.0         0.001366                0.004587
5694       12713         848.55            0          1         9   22.330263        1.0         0.0         0.001969                0.026316